Self-localization Using Visual Experience Across Domains

Authors

  • Taisho Tsukamoto
  • Kanji Tanaka
Abstract

In this study, we aim to solve the single-view robot self-localization problem by using visual experience across domains. Although the bag-of-words method constitutes a popular approach to single-view localization, it fails badly when its visual vocabulary is learned and tested in different domains. Further, we are interested in a cross-domain setting, in which the visual vocabulary is learned in seasons and on routes different from those of the input query/database scenes. Our strategy is to mine a cross-domain visual experience, a library of raw visual images collected in different domains, to discover the relevant visual patterns that effectively explain the input scene, and to use them for scene retrieval. In particular, we show that the appearance and the pose of the mined visual patterns of a query scene can be efficiently and discriminatively matched against those of the database scenes by employing image-to-class distance and spatial pyramid matching. Experimental results obtained using a novel cross-domain dataset show that our system achieves promising results despite the visual vocabulary being learned and tested in different domains.

I. INTRODUCTION

In this study, we aim to solve the problem of cross-domain single-view robot self-localization. Visual localization is crucial for solving SLAM and other similar problems in mobile robotics [1]–[3], [5]. While self-localization can be performed either by using prior knowledge of the problem domain [1] or without it [2], we deal with applications and scenes where a collection of visual images from different domains, termed cross-domain visual experience, is available as prior knowledge. At the same time, we require the localization algorithm to be extremely fast (to support fast robot navigation) and to recognize the place from a single frame [1] (i.e., without temporal tracking [7] or visual sequence measurements [8]).

One of the most popular approaches to single-view localization is the bag-of-words method [1], [5], [10], wherein a collection of local invariant visual features is extracted from an input image and each feature is translated into a visual word by using a pre-learned library of vector-quantized features. Consequently, an input scene image is described compactly and discriminatively as an unordered collection of visual words ("bag-of-words"). However, as argued by several authors, the bag-of-words method fails badly when learned and tested in different domains; the main reasons include the following: (1) Because the bag-of-words method ignores all information about the spatial layout of the features, it limits the descriptive ability …
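As a concrete illustration of the bag-of-words pipeline described above, the following sketch learns a vocabulary by vector quantization (k-means) and describes an image as an unordered histogram of visual words. This is a minimal sketch, not the authors' implementation: the k-means vocabulary, the 32-dimensional descriptors, and all function names are illustrative assumptions.

```python
# Minimal bag-of-words sketch (illustrative, not the paper's code).
# Assumes local invariant descriptors have already been extracted
# from each image by some feature detector.
import numpy as np
from sklearn.cluster import KMeans

def learn_vocabulary(descriptor_stack, num_words=1000, seed=0):
    """Vector-quantize a stack of training descriptors into visual words."""
    kmeans = KMeans(n_clusters=num_words, random_state=seed, n_init=10)
    kmeans.fit(descriptor_stack)
    return kmeans.cluster_centers_  # (num_words, dim) vocabulary

def bag_of_words(descriptors, vocabulary):
    """Describe an image as an unordered histogram of visual words."""
    # Assign each descriptor to its nearest vocabulary word (L2 distance).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)  # L1-normalize

# Toy usage: a vocabulary learned in one domain, applied to another.
rng = np.random.default_rng(0)
train_desc = rng.normal(size=(5000, 32))  # descriptors from the experience domain
query_desc = rng.normal(size=(300, 32))   # descriptors from the query domain
vocab = learn_vocabulary(train_desc, num_words=64)
query_hist = bag_of_words(query_desc, vocab)
```

The matching step named in the abstract combines an image-to-class distance with spatial pyramid matching. The sketch below pairs a naive-Bayes-nearest-neighbor style image-to-class distance (each query descriptor scored against its nearest neighbor in a database place's pooled descriptor set) with Lazebnik-style spatial pyramid histograms; the pyramid depth, level weights, and pooling scheme are assumptions rather than the paper's exact formulation.

```python
import numpy as np

def image_to_class_distance(query_desc, class_desc):
    """NBNN-style image-to-class distance: each query descriptor is scored
    against its nearest neighbor in the pooled descriptor set of a class
    (here, a database place) rather than of a single image."""
    d2 = ((query_desc[:, None, :] - class_desc[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def spatial_pyramid(keypoints_xy, words, image_wh, num_words, levels=2):
    """Concatenate per-cell word histograms over a spatial pyramid, so the
    pose (image location) of matched patterns also contributes to matching."""
    w, h = image_wh
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level  # cells per side at this level
        # Lazebnik-style level weights: coarse levels count less.
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        cx = np.minimum((keypoints_xy[:, 0] * cells / w).astype(int), cells - 1)
        cy = np.minimum((keypoints_xy[:, 1] * cells / h).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cx == i) & (cy == j)
                hist = np.bincount(words[in_cell], minlength=num_words)
                feats.append(weight * hist)
    v = np.concatenate(feats).astype(float)
    return v / max(v.sum(), 1.0)

def pyramid_match(p, q):
    """Histogram intersection kernel between two pyramid descriptors."""
    return np.minimum(p, q).sum()
```

A query scene could then be ranked against the database scenes by combining the two scores, so that both the appearance (image-to-class distance) and the pose (spatial pyramid) of the mined visual patterns contribute to retrieval.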


Journal:
  • CoRR

Volume: abs/1509.07618

Publication date: 2015